<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <meta name="generator" content="pandoc" />
  <title>定位NFS同步写数据时间慢的问题</title>
  <style type="text/css">code{white-space: pre;}</style>
  <style type="text/css">
div.sourceCode { overflow-x: auto; }
table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
  margin: 0; padding: 0; vertical-align: baseline; border: none; }
table.sourceCode { width: 100%; line-height: 100%; }
td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
td.sourceCode { padding-left: 5px; }
code > span.kw { color: #007020; font-weight: bold; } /* Keyword */
code > span.dt { color: #902000; } /* DataType */
code > span.dv { color: #40a070; } /* DecVal */
code > span.bn { color: #40a070; } /* BaseN */
code > span.fl { color: #40a070; } /* Float */
code > span.ch { color: #4070a0; } /* Char */
code > span.st { color: #4070a0; } /* String */
code > span.co { color: #60a0b0; font-style: italic; } /* Comment */
code > span.ot { color: #007020; } /* Other */
code > span.al { color: #ff0000; font-weight: bold; } /* Alert */
code > span.fu { color: #06287e; } /* Function */
code > span.er { color: #ff0000; font-weight: bold; } /* Error */
code > span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
code > span.cn { color: #880000; } /* Constant */
code > span.sc { color: #4070a0; } /* SpecialChar */
code > span.vs { color: #4070a0; } /* VerbatimString */
code > span.ss { color: #bb6688; } /* SpecialString */
code > span.im { } /* Import */
code > span.va { color: #19177c; } /* Variable */
code > span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code > span.op { color: #666666; } /* Operator */
code > span.bu { } /* BuiltIn */
code > span.ex { } /* Extension */
code > span.pp { color: #bc7a00; } /* Preprocessor */
code > span.at { color: #7d9029; } /* Attribute */
code > span.do { color: #ba2121; font-style: italic; } /* Documentation */
code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
  </style>
  <link rel="stylesheet" href="doc/linux-style.css" type="text/css" />
</head>
<body>
<div id="header">
<h1 class="title">定位NFS同步写数据时间慢的问题</h1>
</div>
<div id="TOC">
<ul>
<li><a href="#定位nfs同步写数据时间慢的问题">定位NFS同步写数据时间慢的问题</a><ul>
<li><a href="#nfs_write.bt脚本代码">nfs_write.bt脚本代码</a></li>
<li><a href="#代码分析">代码分析</a></li>
</ul></li>
</ul>
</div>
<h1 id="定位nfs同步写数据时间慢的问题">定位NFS同步写数据时间慢的问题</h1>
<h2 id="nfs_write.bt脚本代码">nfs_write.bt脚本代码</h2>
<div class="sourceCode"><pre class="sourceCode c"><code class="sourceCode c"><span class="pp">#include</span><span class="im">&lt;linux/fs.h&gt;</span>
<span class="pp">#include</span><span class="im">&lt;linux/types.h&gt;</span>
<span class="pp">#include</span><span class="im">&lt;linux/uio.h&gt;</span>
BEGIN {
    printf(<span class="st">&quot;Tracing nfs_file_write execution time... Hit Ctrl-C to end.</span><span class="sc">\n</span><span class="st">&quot;</span>);
}

kprobe:nfs_file_write
{
    $iocb = (<span class="kw">struct</span> kiocb *)arg0;
    $file = $iocb-&gt;ki_filp;
    $dentry = $file-&gt;f_path.dentry;
    $from = (<span class="kw">struct</span> iov_iter *)arg1;
    
    
    @filename_ptr[tid] = $dentry-&gt;d_name.name;  <span class="co">// 存储文件名指针</span>
    @inode[tid] = $file-&gt;f_inode-&gt;i_ino;        <span class="co">// 存储 inode</span>
    @write_offset = $iocb-&gt;ki_pos;
    @write_length = $from-&gt;count;
    @start_time[tid] = nsecs;                   <span class="co">// 记录起始时间</span>
    printf(<span class="st">&quot;%s, name:%s, inode:%d, len:%d, offset:%d, time:%llu</span><span class="sc">\n</span><span class="st">&quot;</span>,
    probe,str($dentry-&gt;d_name.name),$file-&gt;f_inode-&gt;i_ino,@write_length,@write_offset,nsecs);
}

kretprobe:nfs_file_write
{
    $duration = nsecs - @start_time[tid];
    $name_ptr = @filename_ptr[tid];
    $ino = @inode[tid];
    
    printf(<span class="st">&quot;nfs_file_write - File: %-20s Inode: %-8lu Duration: %-12d ns</span><span class="sc">\n</span><span class="st">&quot;</span>,
        str($name_ptr), $ino, $duration);

    delete(@filename_ptr[tid]);
    delete(@inode[tid]);
    delete(@start_time[tid]);
}

END {
    clear(@filename_ptr);
    clear(@inode);
    clear(@start_time);
}</code></pre></div>
<p>NFS 的写操作本质上是 客户端与服务器之间的 RPC（Remote Procedure Call）通信过程。当你在客户端调用 nfs_file_write 时，NFS 客户端会将操作封装为 RPC 请求，通过 TCP/IP 网络发送到服务器，服务器处理后再返回响应。整个过程涉及多个阶段，任何一个阶段的延迟都可能导致整体写入性能下降。</p>
<p>需要统计4.19内核上同步写操作各个阶段所花的时间，nfsv3挂载选项:</p>
<p><code>(rw,relatime,sync,vers=3,rsize=262144,wsize=262144,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard,noac,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=xx.xx.xx.xx,mountvers=3,mountport=2050,mountproto=tcp,local_lock=none,addr=xx.xx.xx.xx)</code></p>
<p>通过man nfs查看sync挂载选项的解释:</p>
<p>NFS 客户端对 sync 挂载选项的处理方式与某些其他文件系统不同（参见 mount(8) 中对通用 sync 与 async 挂载选项的描述）。如果既未指定 sync，也未指定 async（或显式指定了 async），NFS 客户端会将应用程序的写入操作延迟发送到服务器，直到发生以下任一情况：</p>
<p>系统内存压力迫使内核回收内存资源。</p>
<p>应用程序通过 sync(2)、msync(2) 或 fsync(3) 显式刷新文件数据。</p>
<p>应用程序通过 close(2) 关闭文件。</p>
<p>文件通过 fcntl(2) 加锁或解锁。</p>
<p>换言之，在正常情况下，应用程序写入的数据可能不会立即出现在托管该文件的服务器上。</p>
<p>如果在挂载点上指定了 sync 选项，则对该挂载点上的文件进行任何写操作的系统调用都会在返回用户空间之前，将数据刷新到服务器。这能在多个客户端之间提供更强的一致性保障，但会显著降低性能。</p>
<p>应用程序也可使用 O_SYNC 打开标志，在不使用 sync 挂载选项的情况下，对单个文件的写操作强制立即发送到服务器。</p>
<h2 id="代码分析">代码分析</h2>
<div class="sourceCode"><pre class="sourceCode c"><code class="sourceCode c">write
  ksys_write
    vfs_write
      __vfs_write
        new_sync_write
          nfs_file_write
            generic_write_sync <span class="co">// 同步写执行到这里</span>
              vfs_fsync_range
                nfs_file_fsync <span class="co">// file-&gt;f_op-&gt;fsync</span>
                  filemap_write_and_wait_range
                    __filemap_fdatawrite_range
                      do_writepages
                        nfs_writepages
                          nfs_pageio_complete
                            nfs_pageio_doio
                              nfs_generic_pg_pgios
                                nfs_initiate_pgio
                                  .rpc_message = &amp;msg,
                                  .flags = RPC_TASK_ASYNC <span class="co">// 异步rpc</span>
                                  nfs_initiate_write <span class="co">// hdr-&gt;rw_ops-&gt;rw_initiate</span>
                                    nfs3_proc_write_setup
                                      msg-&gt;rpc_proc = &amp;nfs3_procedures[NFS3PROC_WRITE]
                                  rpc_run_task
                                    rpc_new_task
                                      rpc_init_task
                                    rpc_task_set_rpc_message <span class="co">// 设置tk_msg的值，NFS3PROC_WRITE和task关联上了</span>
                                    rpc_call_start
                                      task-&gt;tk_action = call_start
                                    rpc_execute</code></pre></div>
<p>分析代码可知：在使用<code>nfs sync</code>操作的时候，nfs调用rpc层有一个属性是设置结构体<code>msg</code>下的<code>rpc_proc</code>的值，然后再经过<code>rpc_task_set_rpc_message</code>将<code>tk_msg</code>设置，这样在过滤写操作的时候，就可以通过判断<code>tk_msg-&gt;msg-&gt;rpc_proc</code>是否等于<code>NFS3PROC_WRITE</code>,来过滤其他操作，只保留写操作。</p>
<p>对函数进行修改后，最后运行脚本<a href="https://gitee.com/youcunhua/blog/blob/master/work/nfs/stat-nfsv3-sync-write-time/nfs_write_fixed.bt">bpftrace nfs_write.bt</a>,然后用nfs挂载文件，进行nfs的写入，脚本会打印出如下类似的信息。下面是两次nfs同步写操作时间慢的打印信息。 然后运行<a href="https://gitee.com/youcunhua/blog/blob/master/work/nfs/stat-nfsv3-sync-write-time/trace_nfs_write_fixed.sh">trace_nfs_write.sh</a>就可以将time数据转换成我们可读的时间戳。与wireshark的XID进行时间对比。</p>
<div class="sourceCode"><pre class="sourceCode bash"><code class="sourceCode bash"><span class="ex">kprobe</span>:nfs_file_write, name:xxxx inode:1697123520 len:354    offset:132777   time:39290423881392412
<span class="ex">call_transmit</span> start - XID:0xedc0ad03 OP:GETATTR      time:39290423881405002
<span class="ex">call_transmit</span> start - XID:0xedc0ad03 OP:GETATTR      time:39290423982453372
<span class="ex">call_transmit</span> start - XID:0xeec0ad03 OP:WRITE        time:39290423982494112
<span class="ex">call_transmit</span> start - XID:0xeec0ad03 OP:WRITE        time:39290424084932650
<span class="ex">nfs_file_write</span> - File: xxxx Inode: 1697123520 Duration: 0 s 203 ms 592 μs 488 ns

<span class="ex">kprobe</span>:nfs_file_write, name:xxxx inode:1697123520 len:353    offset:140530   time:39290435940709314
<span class="ex">call_transmit</span> start - XID:0xfac1ad03 OP:GETATTR      time:39290435940720794
<span class="ex">call_transmit</span> start - XID:0xfac1ad03 OP:GETATTR      time:39290436041499632
<span class="ex">call_transmit</span> start - XID:0xfbc1ad03 OP:WRITE        time:39290436041541542
<span class="ex">call_transmit</span> start - XID:0xfbc1ad03 OP:WRITE        time:39290436149645344
<span class="ex">nfs_file_write</span> - File: xxxx Inode: 1697123520 Duration: 0 s 209 ms 7 μs 571 ns</code></pre></div>
<p>上述部分打印的信息，就可以看到RPC层的XID、OP；文件的inode 、长度、偏移量，以及调用时间。 - RPC的XID：事务唯一标识符，用于关联同一 RPC 请求的所有事件 - RPC的OP：RPC 操作类型，区分不同操作类型的性能特征</p>
<p>计算得到耗时时间</p>
<div class="sourceCode"><pre class="sourceCode bash"><code class="sourceCode bash"><span class="ex">call_transmit</span> -<span class="op">&gt;</span> call_decode : 104ms
<span class="ex">nfs_file_write</span> -<span class="op">&gt;</span> rpc_execute : 100ms</code></pre></div>
<p>然后通过抓包日志过滤：xid: rpc.xid==0xxxxxxxxx</p>
<p>可以看出XID为0xeec0ad03这个文件需要100ms去调用getattr。</p>
<p>耗时请求高是因为需要先调去nfs_file_write中的getattr，去获取文件属性，然后继续去查看nfs_file_write的调用栈:</p>
<div class="sourceCode"><pre class="sourceCode c"><code class="sourceCode c">nfs_file_write
  nfs_revalidate_file_size
    __nfs_revalidate_inode
      NFS_PROTO(inode)-&gt;getattr <span class="co">//这里调用了getattr</span>
  nfs_revalidate_mapping
    __nfs_revalidate_inode
      NFS_PROTO(inode)-&gt;getattr <span class="co">//这里调用了getattr</span>
  generic_write_sync <span class="co">// 同步写执行到这里</span>
    vfs_fsync_range
      nfs_file_fsync <span class="co">// file-&gt;f_op-&gt;fsync</span>
        filemap_write_and_wait_range
          __filemap_fdatawrite_range
            do_writepages
              nfs_writepages
                nfs_pageio_complete
                  nfs_pageio_doio
                    nfs_generic_pg_pgios
                      nfs_initiate_pgio
                        .rpc_message = &amp;msg,
                        .flags = RPC_TASK_ASYNC <span class="co">// 异步rpc</span>
                        nfs_initiate_write <span class="co">// hdr-&gt;rw_ops-&gt;rw_initiate</span></code></pre></div>
<p>根据数据分析得出结论：用户态写文件时间长是由于nfs服务器回复慢导致的。</p>
</body>
</html>
